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Abstract 

This paper describes the hardware, firmware and software systems used in data acquisition for the NA61/SHINE 
experiment at the CERN SPS accelerator. Special emphasis is given to the design parameters of the readout electronics 
for the 40 m^ volume Time Projection Chamber detectors, as these give the largest contribution to event data among 
all the subdetectors: events consisting of 8 bit ADC values from 256 time slices of 200k electronic channels are to be 
read out with ~100Hz rate. The data acquisition system is organized in “push-data mode”, i.e. local systems transmit 
data asynchronously. Techniques of solving subevent synchronization are also discussed. 
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1. Introduction 

NA61/SHINE is a large acceptance fixed target 
hadron spectrometer experiment at the SPS accelera¬ 
tor at CERN imEnmnia. The main tracking system, 
i.e. the bending magnets and the hve TPC detectors are 
inherited from the former experiment NA49 El. The 
physics program of NA61/SHINE is quite complex and 
covers the search for the critical point of strongly in¬ 
teracting matter, study the onset of deconhnement, the 
quantification of medium effects in strongly interacting 
matter, furthermore the measurement of hadron produc¬ 
tion spectra in hadron-nucleus collisions for cosmic ray 
and neutrino physics applications. These studies are 
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carried out using the SPS beamline which is able to pro¬ 
vide hadron beams in the 10-350 GeV/c, and ion beams 
in the 10-160 GeV/c/nucleon beam momentum range. 
The hadron beams are produced by the 400 GeV/c pro¬ 
ton beam of the SPS accelerator hitting a beryllium pro¬ 
duction target and are tagged with their particle type us¬ 
ing a differential Cerenkov detector trigger signal. Ion 
beams are either direct beams from SPS whenever com¬ 
patible with the accelerator schedule, or are fragmented 
from a lead beam of the SPS using a beryllium fragmen¬ 
tation target with subsequent element tagging using a 
threshold Cerenkov detector or via scintillator response 
amplitude. The beam rate reaching the detector is up 
to 100 kHz, being the safety limit, out of which about 5 
to 100% are the tagged useful beam particles with the 
selected type, depending on the actually used conhgu- 
ration. This beam hits a fixed target which is either liq¬ 
uid hydrogen or solid state material, depending on the 
reaction to be studied. The thickness of the target is ad¬ 
justed in such a way that the collision probability of the 
selected beam particle type with the target material is 
around 0.1-3% in order to limit the contribution of sec¬ 
ondary collisions within the target material. This setting 
provides ~l-3 kHz collision event rate of the right type 
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to be potentially recorded by the spectrometer down¬ 
stream of the target. 

The outline of the experimental setup is shown in 
Fig. [3 The ultrarelativistic particles produced in the 
collision within the target enter into the strong, ~0.1- 
1.5 Tesla, field of two superconducting bending magnets 
where the trajectories are deflected according to their 
momentum to charge ratio. The bending power of the 
magnet system is up to 9 Tm. Within and downstream 
of the bending magnets, five large volume TPC (Time 
Projection Chamber) detectors record the charged parti¬ 
cle trajectories in a 40 m^ tracking volume. A set of ToF 
(Time of Flight) detectors aid the particle identification. 
The most downstream detector of the experiment is the 
PSD (Projectile Spectator Detector), a calorimeter used 
to measure the energy fraction of the piece of projectile 
nucleus which did not take part in particle production, 
providing a geometric measure of collision centrality. 



Figure 1: The setup of the NA6 I/SHINE detector. A set of trigger 
and beam counters are placed on the beamline, followed by the fixed 
target. A large acceptance magnetic spectrometer setup with 40 m^ 
tracking volume is placed downstream of the target. The spectrometer 
setup consists of two large superconducting magnets and TPC (Time 
Projection Chamber) volumes. The setting is completed by a set of 
ToF (Time-of-Flight detectors) along with a calorimeter in the beam¬ 
line, called PSD (Projectile Spectator Detector). In some of the runs 
the target is surrounded by a special detector, called LMPD (Low Mo¬ 
mentum Particle Detector) (7), which is a small TPC chamber using 
the same readout as the large ones. 

Upon the start of the NA61/SFIINE program, several 
new detectors were added to the original NA49 tracking 
setup. In addition, the available readout rate not greater 
than ~ 10 Hz in NA49 was insufficient for fully exploit¬ 
ing the available beam rates, and was also insufficient 
for completing the data taking schedule in a timely man¬ 
ner with the available 3-5 month of beamtime per year. 
Therefore a decision was made to design and produce 
a new readout system for NA61/SHINE in order to be 
able to record events at a rate ~ 100 Hz, along with in¬ 
corporating new detectors into the readout chain. In this 
upgrade project the most elaborate part was a new so¬ 
lution for the TPC readout, as this detector component 


gives the largest contribution to the raw data. There¬ 
fore, this paper gives a special emphasis on the design 
requirements and description of the TPC readout part of 
the NA61 DAQ (Data AcQuisition) system. 

The paper is organized as follows. Section sum¬ 
marizes the requirements on the new TPC electronics 
as a main motivation. Section [3 describes the new TPC 
electronics, their firmware and working principle. Sec¬ 
tion describes the integration of further subdetectors 
into the readout chain. Section|^summarizes the signal 
and data flow lines within the experimental setup. Sec- 
tionj^describes the solutions for guaranteeing the event 
synchronization in the parallel data channels. Section]?] 
describes the online software, mainly the Central DAQ. 
Section]^outlines the structure of the raw data files writ¬ 
ten by Central DAQ. Section]^summarizes the observed 
performance. Section]^concludes the paper. 

2. Requirements on the TPC readout electronics 

The 40 m^ TPC chamber system produces the largest 
and most complicated subevent data of the experiment; 
for each triggered event the charge deposit on each of 
the 182784 readout pads need to be acquired throughout 
the 51.2jusec drift time in 256 or 512 samples, where 
the charge deposit is measured in 1 Byte ADC counts. 
This means 50 or 100 MByte per event in the 256 or 
512 time sampling mode, respectively. In normal data 
taking, the 256 timebin mode is used, the 512 timebin 
mode is needed only during short and rare periods of 
gain calibration runs with radioactive Krypton gas in 
the TPC volumes. During normal beam data taking 
50 MByte per event needs to be transported at a ~ 100 Hz 
event rate, which would result in ~5 GByte/sec data 
rate. However, the TPC detector data is zero dominated, 
since detection signal in the TPC volume only appears 
where particle trajectories traverse it. Therefore, it is 
evident that some kind of data compression method can 
and should be applied. 

The original NA49 PEE (Pront-End Electronics) 
cards of the TPC chambers 0 ]9l provided a minimal 
dead time of 11.7 msec when read out. This already 
allows 86 Hz readout rate without modification of the 
5712 pieces of FEE cards. A trade-off between the 
needed readout rate and the development time and costs 
motivated the re-usage of the NA49 PEE cards in the 
new system without modification, and the redesign of 
the readout electronics upstream of them. The elec¬ 
tronic units reading out the PEE cards are called Moth- 
erBoards. They perform the steering of the readout pro¬ 
cess of the PEEs, the data compression before trans¬ 
fer, and subsequent serial transmission of the processed 
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FEE data. Because of geometrical and data density con¬ 
straints, the MotherBoards were designed to read out up 
to 24 EEE cards, and therefore 248 MotherBoards were 
used in the complete system. 

Due to the extended size of the full experimental 
setup, event data need to be transmitted to large dis¬ 
tances, about 50 m, from the detectors to the location 
of the Central DAQ in the control room. This means a 
high risk for accidental introduction of ground loops in 
the system, and therefore galvanically decoupled trans¬ 
mission lines were needed. Our choice fell to the rela¬ 
tively cheap and well understood large bandwidth DDL 
(Detector Data Link) system inoiin] for long range op¬ 
tical data transfer, also used in the ALICE experiment 
working at the LHC accelerator at CERN. 

In order to minimize the number of optical links to¬ 
ward Central DAQ, an intermediate serialization stage 
was needed in between the MotherBoards and the Cen¬ 
tral DAQ. These units were called the Concentrator- 
Boxes and were designed to serialize data of up to 32 
MotherBoards onto a DDL line. Short distance data 
transfer between the MotherBoards and the Concentra- 
torBoxes used relatively inexpensive LVDS (Low Volt¬ 
age Differential Signal) connections. These connec¬ 
tions, being differential, are noise tolerant and although 
they do not provide galvanic isolation, can work with up 
to +1 V common mode mismatch between the transmit¬ 
ter and the receiver side. 

3. The TPC readout electronics 

Motivated by the requirements discussed in Sec. 
the NA61 TPC readout system fill has four main com¬ 
ponents as shown in Lig.|^ 

The main parts of the assembly are; 

1. Lront-End Electronics ||8l; these cards perform the 
analog sampling of TPC pad charges, the ana¬ 
log storage of these and subsequent digitaliza¬ 
tion. Upon trigger arrival, one PEE card sam¬ 
ples the preamplified and shaped pad charges over 
51.2 yusec time duration in 256 or 512 timeslices us¬ 
ing an SC A (Switched Capacitor Array) 13. The 
pertinent sampling clock is derived from a cen¬ 
trally generated 25 MHz oscillator in order to avoid 
phase ambiguity. Poliowing the sampling period 
the, stored charges of each timeslice are digitized 
to 9 bits using Wilkinson ADCs. One PEE can han¬ 
dle 32 TPC detection pads. Due to historical rea¬ 
sons, each PEE is built up of two equivalent halves 
each handling 16 pads. The PEE cards are con¬ 
trolled and read out via a 50 line flat cable by a 
MotherBoard. 



Figure 2: Block diagram of the NA61 TPC readout system. Up to 24 
FEE cai'ds are accessed by a processing MotherBoard using flat ca¬ 
ble connection, then up to 32 MotherBoards transmit their data to a 
ConcentratorBox using LVDS connection, finally up to 10 Concentra- 
toi'Boxes forward the data to the Central DAQ PC using DDL optical 
line. 


2. MotherBoards lEl: these organize the readout of 
the PEE cards and build a subevent data structure. 
They also perform pedestal subtraction, noise sup¬ 
pression, bit truncation and zero compressiorj^ A 
MotherBoard has 24 input ports which are capable 
of reading out PEEs in parallel. Por the 24 PEE 
cards two different cabling topologies are possible: 
one PEE card on each port (24x1 cabling), and 2 
PEE cards connected in daisy chain by a flat ca¬ 
ble to 12 ports of a MotherBoard (12x2 cabling). 
After reading out the PEE cards, the ADC val¬ 
ues are pedestal subtracted, and any residual values 
smaller than a typical threshold of 6 ADC (~2 cr of 
the noise level) are substituted with zero values in 
a noise suppression process. These pedestal sub¬ 
tracted and noise suppressed ADC values are then 
truncated to 8 bits, are ordered according to natu¬ 
ral detector geometry, and are subsequently serial¬ 
ized into a data stream. This data stream is then 
zero compressed, i.e. in case of zero ADCs only 
one zero and the number of consecutive zeros (up 
to 255) are transmitted in the stream. The serial¬ 
ized data of the PEEs are then sent by the Moth¬ 
erBoard onto an 50 Mbit/sec effective bandwidth 
LVDS connection, its transmission medium be¬ 
ing a maximum 15 m long STP (Shielded Twisted 
Pair) cable. The LVDS connection reaches the next 
stage, the ConcentratorBoxes. 


possibility for using Huffman compression is also foreseen, but 
it is not yet implemented, as the compression rate of simple zero com¬ 
pression proved to be sufficient for present data taking settings. 
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3. ConcentratorBoxes d: these are standalone se¬ 
rializer/deserializer boxes with 32 bidirectional 
LVDS inputs/outputs. A maximum of 32 Mother- 
Boards can be connected to one ConcentratorBox. 
During the readout, the ConcentratorBox serial¬ 
izes the incoming data streams onto a single 32 bit 
wide data stream. Then the 32 bit data is transmit¬ 
ted to the DAQ server PC through DDL connec¬ 
tion. In the other directions, the ConcentratorBox 
receives commands, pedestal table and other data 
from the DAQ PC addressed to the MotherBoards. 
The ConcentratorBox routes these data to the se¬ 
lected MotherBoard. 

4. DDL ifTOlfTTl: the DDL links are high-speed, se¬ 
rial optical links with parallel interfaces developed 
for the ALICE experiment. A link consists of 
a SIU (Source Interface Unit) card, a PCI-X or 
PCI-Express RORC (Readout Receiver Card) and 
a maximum 200 m length duplex optical cable with 
LC connectors. The DDL transfers the acquired 
detector data, data blocks or status info to the Cen¬ 
tral DAQ computer, and using a backward chan¬ 
nel can load data blocks or commands from the 
Central DAQ system to the lower level systems. 
The bandwidth is guaranteed to be a minimum of 
200MByte/sec in both directions. 


of the EEE cards to the MotherBoard, as in the latter 
case the two EEE card on the same flat cable is read out 
in an alternating order, causing a slight overhead. The 
maximum possible speed using 256 timeslices is around 
90 events per second, limitation due to the EEE cards 
architecture. The control signals necessary for the EEE 
readout are all produced by the MotherBoard. 




The MotherBoard has four different working modes: 
command processing mode, data block uploading or 
downloading mode, data collecting mode, pedestal col¬ 
lecting mode. In case of data collecting or pedestal 
collecting mode, upon receiving a “Start Data Taking” 
command the MotherBoard waits for the trigger sig¬ 
nals. The workflow of the data acquisition is organized 
as follows. A common external clock (TPC CLK) is 
distributed to the MotherBoards in order to synchronize 
the readout system. The acquisition starts when a com¬ 
mon external trigger is received. The TPC CLK along 
with the trigger are distributed via ECL lines. Receiv¬ 
ing the trigger the MotherBoards force the EEE cards to 
start sampling the analog charges deposited on the TPC 
pads after amplification and shaping. Upon finishing, 
the MotherBoards issue digitization command and read 
out the raw amplitude ADC data produced on the PEE 
cards. The digitalization and ADC readout are made in 
timeslice order (Pig.|^: 

1. digitalization of 32 channels in parallel, 

2. readout of the 32 channels in channel order for all 
the PEE-s simultaneously, 

3. the above steps are repeated for each timeslice. 

The readout time slightly depends on which of the 24x1 
or 12x2 cabling topologies are used for the connection 


Figure 3: Time diagram of readout process of an FEE card in 24x1 
(top panel) and 12x2 (bottom panel) cabling mode. The readout is 
done in the timeslice order, i.e. for each timeslice the charges on the 
1x32 channels of 1 FEE card or 2x32 channels of 2 FEE card is digi¬ 
tized, and then they are read out in channel order. The magnified time 
regions are indicated by the dashed lines. 

The schematic block diagram of a MotherBoard hard¬ 
ware architecture, based on PPGA technology for main¬ 
taining good development flexibility, is seen in Pig. 
Each MotherBoard is equipped with 24 flat cable con¬ 
nectors for the 24 PEE cards. The PEE cards are read 
out in parallel, and their data are temporarily stored in 
the PPGA. The MotherBoard automatically multiplexes 
the 24 input data stream in a single serial output data 
stream. Before sending the data, the MotherBoard ar¬ 
ranges them in padrow order and performs zero com¬ 
pression, reducing the amount of data. Por the output 
data, sent on the LVDS lines to the ConcentratorBox, 
a simple, reliable bidirectional link protocol has been 
developed. The pedestal table and the noise suppres¬ 
sion threshold can be downloaded to the MotherBoard 
through the same serial link. Some parameters impor¬ 
tant for the readout, such as the TPC and MotherBoard 
location identifiers, or the cabling topology of 24x1 or 
12x2 can be preset using jumpers. 
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Figure 4: Schematic block diagram of a MotherBoard hardware ar¬ 
chitecture. 


The simplified firmware block diagram of the Moth¬ 
erBoard firmware is shown in Fig. 

• The “FEE Write and Read-Out Controller” unit 
controls the read out of the EEE cards and subtracts 
the pedestal values. 

• The “Pedestal Write and Read-Out Controller” 
module has several functions. During data taking 
it reads the pedestal value from the external SRAM 
memory belonging to the current data. In pedestal 
collecting mode it saves the value in the external 
SRAM and transmits all pedestals toward the Con- 
centratorBox when the collecting has finished. Us¬ 
ing the received pedestal, i.e. zero-signal events, 
the DAQ computer calculates the table of average 
pedestals and sends it back to the MotherBoard, 
saving it in its external SRAM memory. This mod¬ 
ule has an additional function which is activated in 
the rare cases when the output EIEO of the Mother- 
Board becomes full. In that case the incoming data 
are saved in the upper half of the external SRAM 
and sent to the “Zero Compression” unit only at the 
end of data reading. 

• The “Pad-Row Builder” unit sorts the data accord¬ 
ing to their padrow number. 

• The “Zero Compression” module compresses the 
incoming data and writes them into the output 
EIEO. 

• The “Command reply. Event Builder and Seri- 
aliser” unit continuously takes the data from the 
output EIEO and converts them into serial data. 
Additionally, this module sends status information. 
Also this module is responsible for the uploading 
of data blocks, such as pedestals from the exter¬ 
nal SRAM or the firmware file from configuration 


EEPROM toward the ConcentratorBox, whenever 
verification reading is requested. In order to distin¬ 
guish normal data bytes and status data bytes, the 
module frames the 8 bit wide data stream to 11 bits, 
the leading 2 bit encoding the type of the data byte, 
whereas the trailing 1 bit being an end-of-frame. 
In order to reach the 50 Mbit/sec effective transfer 
rate for the data bits, the LVDS lines were driven 
at a 11/8 times larger frequency, i.e. at 68.75 MHz 
because of the overhead of the frame bits. 

• The “Command and Deserialiser” unit parallelizes 
the incoming serial data. It decodes the received 
commands from the ConcentratorBox and decides 
what to send as reply: status information, the 
firmware file or a pedestal table data block. 



Figure 5: Schematic block diagram of the MotherBoard firmware. 

The data stream of a MotherBoard is produced as 
follows. When the MotherBoards reads the 9 bit ADC 
charge of a timeslice of a pad, the previously stored cor¬ 
responding pedestal value is subtracted, and this data 
word is replaced by zero value in case it is below the 
predefined noise threshold. After this, the incoming 
9 bit data are truncated to 8 bit: if an ADC value is larger 
than 255, the value 255 is returned instead. After the 
above processing the data are transferred to a EIEO for 
each EEE card. Erom this EIEO these are read out in 
padrow order and sent to the zero compressor unit. Af¬ 
ter performing the zero compression the resulting 8 bit 
wide data stream is written to the output EIEO as shown 
in Eig.|^ A 1 word (4 Byte) event header containing the 
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status of the jumpers and the trigger counter precedes 
the data stream. At the end of the data stream the Moth- 
erBoard can send a 4 word (16 Byte) event trailer con¬ 
taining the FEE and MotherBoard voltage values, plus 
an error and status word to the ConcentratorBox (see 
Fig- 0- These five status words are optional, but are 
used by default. When, during data taking, the Concen¬ 
tratorBox for any reason cannot accept data, the Con¬ 
centratorBox sends an XOFE control word to the Moth¬ 
erBoard. Receiving XOFE the MotherBoard stops send¬ 
ing data to the Concentrator. The MotherBoard tem¬ 
porarily stores the data in its output EIEO till the arriv¬ 
ing of an XON control word. If during this time the 
output EIFO becomes full, the MotherBoard can follow 
one of the two, configurable policies; 

1. The MotherBoard continues to read out the EEE 
cards without loading data into the output EIEO. 
In this case the event is truncated, which fact is 
signaled in the error and status word at the end of 
the event. 

2. The MotherBoard, after noise suppression and 
padrow ordering, sends the data temporary into the 
higher part of external SRAM memory. Only the 
higher part can be used for that purpose as the 
lower part contains the pedestal table. At the mo¬ 
ment when free space becomes available in the out¬ 
put EIEO the data are transferred to the zero com¬ 
pressing module. This event of temporary storage 
is also logged in a warning bit of the error and sta¬ 
tus word. 

Normally, only the latter mode is used, the former mode 
is applied only in case of very rare calibration runs in 
the 512 timebin mode. The advantage of the second 
policy is that the event is transferred as a whole. The 
above sequence of reading out - pedestal subtraction - 
noise suppression - padrow ordering - zero compression 
- sending to ConcentratorBox is repeated until receiving 
a “Stop Data Taking” command from the Concentrator- 
Box, which signals the end of a data taking run for the 
MotherBoard. 

The ConcentratorBox receives data packets from the 
MotherBoards via 32 LVDS channels and forwards their 
serialized data onto a DDL line, as shown in Fig.[^ The 
firmware of a ConcentratorBox (Fig. has two main 
parts: the serializer/deserializer for each channel and 
the Data Formatter, which creates data structure to be 
fed to the DDL. For each channel one receiver and one 
transmitter logic is implemented. The receiver logic rec¬ 
ognizes the start and end of event frame, performs serial 
to parallel data conversion and buffers the data in the re¬ 
ceiver FIFO. The transmitter logic reads data from the 



Figure 6: Schematics of the data flow inside a MotherBoard. 
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Figure 7: Schematic description of the data structure on a Mother- 
Board output. 


transmitter FIFO, performs parallel to serial conversion, 
and attaches start of frame and end of frame symbols. 
Both logics ensure the flow control between the Moth¬ 
erBoard and the ConcentratorBox. The input packets 
are buffered in FIFOs. The readout of the FIFOs is or¬ 
ganized in cycles. In each cycle the firmware reads max¬ 
imum 128 pieces of 4 Byte data words from each chan¬ 
nel’s receiver FIFO in a sequential way. If the FIFO of a 
channel is empty then the given board is skipped. A so- 
called mini-header is attached to each data packet con- 
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taining the identification number of the source Mother- 
Board and the length of the block. These packets are 
called “LVDS trains”. The readout cycle continues till 
the end-of-frame signal is reached in all channels. Be¬ 
fore sending the first mini-header an event header is sent 
to the DAQ. The structure of the data stream transferred 
to the DDL link is showed in Fig. 10 Upon the arrival of 
the first data to the ConcentratorBox, a TTL output line 
is set to logical one level (Busy signal), and this sig¬ 
nal persists until the ConcentratorBox recognizes that 
all data from all of its MotherBoards are drained out. 
The Busy signal is used for trigger level synchroniza¬ 
tion of the subdetectors. 
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LVDS Connector 32 I 


t 


3LJC 


LVDS 

Transmitter 




LVDS 

Receiver 


T 


TlH 


DDL Source 
Interface Unit 


Configuration 


I 



Cyclone II 
FPGA 


Hiiim 



System 


Clock 


Configuration 


Busy 

Connector 


Connector 


Figure 8: Block diagram of a ConcentratorBox. 


The DDL transfers data from ConcentratorBox to 
RORC plugged in one of PCl-X or PCI-Express slots 
of the DAQ computer. The DDL works using a push- 
data mode protocol, i.e. after starting data collection 
the ConcentratorBox sends the data to the RORC when 
they are produced, without waiting for any request. The 
RORC’s task is to transfer the data into DAQ PC’s mem¬ 
ory via the PCI bus. To fulfill this task the RORC has 
a FIFO up to 128 entries which can be filled with PCI 
addresses and lengths of memory blocks where the data 
blocks can be written to. When a data block arrives the 
RORC takes the next address in the FIFO and writes 
the block there using DMA (Direct Memory Access), 
without processor intervention. For the DMA, it is nec¬ 
essary to provide continuous physical memory pages. 
To assure this, we use a special memory tool; the Phys- 
mem package, which gives access to a fixed part of the 
physical memory of a Linux machine for unpaged, un¬ 



Figure 9: Block diagram of ConcentratorBox firmware. 


swapped physical I/O. This memory is outside of the 
memory assigned for the operating system. However 
it can be reached by both the kernel and the user pro¬ 
cesses via the Physmem driver. The package’s driver is 
loaded as a kernel module at boot time, and care must 
be taken to limit the memory used by the kernel using 
boot arguments in order to allow space for the Physmem 
above Linux memory. The driver provides the user with 
the physical address and size of the continuous memory 
which can be used as the destination area of read out 
data. 

During one DMA, only one data page can be written. 
Data pages that belong to the same event are transferred 
over the DDL in one or more DDL blocks. Each block 
can be up to 4 x (2'^ - 1) = 2097148 bytes. When a 
data block is transferred into the Physmem memory the 
RORC puts two control words to a predefined area of 
memory where it specifies the length of the block and 
signals if the block is the last one of an event. 

For RORC functionality the RORC’s driver module 
has to be loaded. This driver produces the mapped 
user addresses of the RORC registers. The pertinent ad¬ 
dresses are used for accessing the RORC registers and 
address FIFO. Using these registers the DAQ program 
initializes the ConcentratorBoxes and MotherBoards, 
can send pedestal data, while using the address FIFO 
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Figure 10: Data stmcture at the output of a ConcentratorBox. 


it can follow the course of the data acquisition. 

Presently, the TPC Data AcQuisition is steered by 
the Central DAQ software, and is ran on a single Cen¬ 
tral DAQ PC with the following parameters: X7DB8-X 
motherboard, 64 bit, total 8 cores of Intel Xeon CPU 
at 2 GHz, 8 GByte memory, 6 PCI-X card slots, 4 USB 
ports, 1 serial port. The computer runs with a 64 bit 
SLC6 Linux operating system. The memory available 
for the Linux kernel is restricted to 4 GByte using a ker¬ 
nel argument. The remaining, upper 4 GByte memory is 
given to the Physmem kernel module, for receiving the 
subevent data. The built event data are written to a fast 
temporary disk buffer, consisting of a RAIDO array of 
5 pieces of 2 TByte hard disks, providing about 5 times 
larger bandwidth for writing than a single hard disk. 

The data taking cycle of the full TPC system 
causes approximately 12 ms deadtime, determined by 
the present setting of TPC FEE cards. 

4. Further subdetectors 

A number of further subdetectors (PSD, beamline de¬ 
tectors, ToE detector) also needs to be read out along 
with the tracking data of the TPC system described in 


Sec. These, however, only produce small volume 
data, merely 230 kByte without any compression. Also, 
their deadtime is smaller than that of the TPC system. 
The organization of the readout of these subsystems are 
summarized in the following subsections. 

4.1. The readout electronics of the PSD calorimeter 

The PSD calorimeter lfT3l is a very important com¬ 
ponent of the NA61 experiment for the nucleus-nucleus 
collision data taking periods. This detector is designed 
for the determination of collision centrality by mea¬ 
suring the energy of the spectator nucleons, i.e. of 
beam nucleons not participating in the collision. Its 
design parameters were set such that its energy reso¬ 
lution allows centrality determination with a precision 
of +1 spectator nucleon at the lowest used beam en¬ 
ergy. The calorimeter consists of 44 modules cover¬ 
ing a 120xl20cm^ transverse area, and the longitudi¬ 
nal depth covering 5.7 nuclear interaction lengths. The 
modules of the calorimeter consist of a sequence of lead 
and plastic scintillator plates, which are grouped into 10 
longitudinal sections for shower prohle monitoring, the 
light yield of each section being read out by an MAPD 
(Micro-pixel Avalanche PhotoDiode). The MAPD sig¬ 
nals are stored in time samples and are read out by PSD 
MotherBoards. These were designed to have similar in¬ 
terface towards the Central DAQ as the TPC Mother- 
Boards, connected to the system via the Concentrator- 
Boxes. The raw data volume of the PSD subevent is 
200 kByte, and its detector readout has a deadtime of 
2.5 ms. 

4.2. The readout electronics of the beamline detectors 

The beamline detectors, such as the beam scintilla¬ 
tors, Cerenkov counters and beam position detectors 
are read out using legacy commercial CAMAC ADC, 
TDC and scaler modules, altogether ~300 channels. 
A CAMAC-to-VME bridge (CES VIC8251) provides 
connection to a dedicated VME crate containing a VME 
controller (CES FIC8234), which runs simple low level 
readout software. Since the FIC8234 can receive an ex¬ 
ternal trigger interrupt upon the detection of a physical 
collision, this feature is used for notihcation of the low 
level readout software to move the event data via the 
CAMAC-to-VME bridge into a VME memory module 
(MicroMemoryInc MM6390), in which these subevents 
are stored in a ring buffer. This low level DAQ sys¬ 
tem is self-contained and can run independently from 
Central DAQ. The Central DAQ PC may access the per¬ 
tinent MM6390 VME memory module via a VME-to- 
USB bridge (CAEN V1718) in order to poll for new 
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subevents. Whenever a new subevent arrives, the Cen¬ 
tral DAQ PC copies it into its own ring buffer. A data 
flow blocking mechanism is implemented via a VME 
commandable register unit (CES RCB8047 CORBO). 
The Central DAQ can use it to pause the event flow 
recorded by the low level system in the rare cases of 
transient bandwidth problems with the moving of the 
subevent data. The data volume of this subevent is 
merely 6.5kByte. The deadtime of this subdetector 
readout is 11 ms. 

4.3. The readout electronics of the ToF system 

A quite similar setup is used to read out the 2080 TDC 
(Time-to-Digital Converter) and QDC (Charge Analog- 
to-Digital Converter) channels of the ToE system. The 
legacy Eastbus based TDC and QDC units are linked 
using a Eastbus-to-VSB bridge (CES LDA9212) to a 
dedicated VME/VSB crate steered by a VME/VSB con¬ 
troller (CES EIC8234), running a low level readout 
software. The Eastbus TDC and QDC units are pre¬ 
triggered using the 30 ps time resolution start scintilla¬ 
tor signal, Sl_l, of the experiment, and the pre-trigger 
may or may not be confirmed by a 8 ns time jitter Main 
Trigger signal generated by the trigger logic using fur¬ 
ther beam counters within a timeout of 300 ns. If the 
Main Trigger was not present, the TDC and QDC data 
are cleared using a East Clear signal causing only a 
1 yusec deadtime. Whenever the Main Trigger signal 
confirmed the pre-trigger, the low level readout software 
on the EIC8234 controller is notified via its external in¬ 
terrupt port. The TDC and QDC data are moved via 
the Eastbus-to-VSB bridge into the internal memory of 
EIC8234, and these subevents are stored in a ring buffer 
scheme. The Central DAQ PC polls this ring buffer for 
new events and moves them away via a VME-to-USB 
bridge (CAEN V1718) and propagates them to its own 
ring buffer in memory. The data flow control to the Cen¬ 
tral DAQ is again realized via a VME configurable reg¬ 
ister unit (CES RCB8047 CORBO) in order to prevent 
the low level ring buffer from overwriting itself before 
the moving of the subevents by the Central DAQ takes 
place. The data volume of this subdetector is 21 kByte, 
and its deadtime is 8 ms. 

An overview of the organization of the entire NA61 
readout system including the legacy beam detector and 
ToE readout is outlined in Eig. 0 The legacy EEEs of 
the ToE detectors as well as the beamline detectors are 
foreseen to be upgraded using the new DRS technol¬ 
ogy m in the near future, and their data transmission 
scheme is foreseen to be adapted to that of the TPC sys¬ 
tem discussed in Sec. [phased on DDL IfTOll technology. 



Figure 11: Overview of the entire data acquisition system of NA61 
along with the new TPC readout system, PSD readout system and the 
legacy beam detectors and ToF system. 


5. Signal and data flow 


As mentioned previously, in the full experimental 
setup the data taking is controlled via a set of trigger 
signals from beam counters and Busy signals from sub¬ 
detectors. These communication lines, along with the 
actual data flow lines within the experimental setup are 


summarized in Eig. 12 The Sl_l signal serves as a tim¬ 
ing reference and a pre-trigger for the detector complex. 
The signal of the V1 veto counter, used for filtering out 
already scattered beam particles just upstream of the 
target, is applied as a pre-trigger-veto. The rest of the 
counter signals (S2, VO, VI’, CEDAR, Z Detector, S4, 
S5) have somewhat worse time resolution and are used 
by the Trigger Logic coincidence unit to build the final 
Main Trigger signal, confirming the pre-trigger. If all 
the subsystems are in a state that is ready for data tak¬ 
ing, the Busy Logic lets the Main Trigger pass, and thus 
generating the Main Trigger Accepted signal in order to 
start actual reading out of the detector data. The path¬ 
ways of the data flow from the subdetectors, such as the 
TPC, the PSD, the ToE and beam counter systems are 
also indicated. 


6. Data synchronization 

The synchronization of the deadtime of the subdetec¬ 
tors for each event is done by a Busy Logic system. This 
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Figure 12: Summary figure showing the signal and data flow lines within the NA61/SHINE experimental setup. The work cycle is started by 
the signal of the best time resolution photomultiplier of the SI counter (Sl_l) as a pre-trigger, and may be vetoed by the signal of VI as a pre- 
trigger-veto in order to filter out deflected beam particles. The rest of the beam counters are used by the Trigger Logic to build a Main Trigger 
signal confirming the pre-trigger. In case all the subsystems are in ready state, the Busy Logic lets the Main Trigger pass, triggering the actual data 
readout. The data flow lines from the subdetectors are also indicated. 


vetoes the trigger signal distributed to all the subsystems 
in case any of the subdetectors are not in their ready 
states, which is determined via a Busy signal from each 
subdetector. In addition, the trigger is also blocked if 
any of the fast systems (beam detectors, or ToF system) 
received a pre-trigger, i.e. an Sl_l signal not vetoed by 
VI. 

As the NA61 DAQ system is organized in a push-data 
mode, i.e. the low level systems receive the data tak¬ 
ing trigger independently and they push their data asyn¬ 
chronously to a higher level, some kind of protection 
mechanism is needed for checking against any mixing 
of subevents in the parallel data streams. Initial syn¬ 
chronization of the readout systems of all the subdetec¬ 
tors occurs at the start of a data taking run, when all the 


subdetector trigger counters are reset. In order to main¬ 
tain this synchronization, we use the fact that the SPS 
beam is provided to NA61 experiment in ~ 10 sec long 
“spills” with at least 5 sec break in between them (in the 
most usual data taking conditions the duty cycle is about 
30%). During the spill breaks, all the data channels to¬ 
ward the Central DAQ are emptied. Upon detecting a 
stop of event flow for 3 seconds, the trigger counters 
in the subevent data channels are cross-checked as they 
should be equal if no false trigger has been received by 
any subdetector and if no subdetector failed to receive 
a true trigger. In case the trigger counters are equal, 
the corresponding identifier of the last recorded event is 
marked as the last good one, and is approved for archiv¬ 
ing. In the rare cases when this check does not pass. 
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the data taking run is halted, the event stream is deleted 
back to the last good event, the Central DAQ issues an 
error message of “trigger skew”, and a completely new 
data taking run is started with all the subdetectors fully 
reset. Such a recovery run stop-and-start takes about 1 
minute. 

A further crash scenario can occur when the finaliza¬ 
tion of the subevent by one of the subsystems is unfin¬ 
ished, i.e. the end-of-subevent trailer in a given channel 
does not arrive. In such case, the Central DAQ issues 
an “event reading timeout” within 1 second and a com¬ 
pletely new run is started with subdetectors reset. 

A very rare but very harmful crash scenario is when 
a subsystem starts data taking e.g. due to a false trigger, 
but is not able to send the subevent header to the Cen¬ 
tral DAQ due to some internal inconsistency. In such 
case the Central DAQ cannot know that the recording of 
a new event has been started and therefore cannot time 
out. However, since the data taking cycle has begun, the 
given subsystem sets its busy signal thus blocking the 
arrival of any further trigger signals. In such scenario 
the data taking would hang completely without any par¬ 
ticular error message until human intervention could de¬ 
tect and resolve the problem. Therefore, a custom made 
TTL register module was built in order to monitor the 
busy signals of all the subdetectors. The pertinent reg¬ 
ister can read 16 TTL signals which are visible to the 
Central DAQ PC via its RS232 serial port. Whenever 
the subevent flow comes to a standstill the busy register 
is checked that none of the busy signals remained in the 
“true” state. In case any of the busy signals remain in 
“true” 1 second after the halt of the event flow, a “a busy 
signal got stuck” error is issued by the Central DAQ and 
a completely new run is started with subdetectors reset. 
In order to maintain galvanic isolation of the Central 
DAQ PC, an optical RS232 coupler is used. 

The occurance rate of any kind of data inconsistency 
condition in the data stream during stable data taking 
is seen to be about 0 to about 5 times a day, which is 
considered to be well tolerable. These failures are nor¬ 
mally traced back to some infrastructural problem such 
as power supply or cooling water service instability. 

7. The readout/online software 

The structure of the Central DAQ software, called 
Runcontrol consists of the following 9 subprocesses. 

1. GUI. the GUI of the software is a Tcl/Tk Qa 
wrapper script, providing a user friendly interface. 

2. Runcontrol: the core software under the GUI is 
written in C language, running autonomously. Its 


user interface is a simple shell-like command line 
interpreter based on GNU Readline Library ESI. 
For experts, it is possible to use this shell-like com¬ 
mand line environment for debugging mode, how¬ 
ever, normally it is handled by the GUI, emulating 
user commands which are sent and received be¬ 
tween the command line interpreter and the GUI 
via Unix pipes and log files. 

3. Event Server: whenever the core of Runcontrol is 
started, an event server process is created. This is 
a fork-server and can serve up to 16 monitoring 
clients for online event display. 

4. On top of the above, the following processes are 
launched upon start of a data taking run: 

(a) Logger: this is logger and consistency check¬ 
ing process. It is also responsible for check¬ 
ing the trigger counter synchronization of all 
subsystems whenever there is a pause in the 
data stream. In case the trigger counters on 
the incoming DDL and VME channels are 
inconsistent, a “trigger skew” error is issued 
and a completely new run is started with sub¬ 
detectors reset. Also, the busy signals of the 
subsystems are monitored by this process us¬ 
ing our custom made register unit accessible 
via the serial port. 

(b) Trigger Client: this is a process for commu¬ 
nication with the Trigger System. This client 
retrieves scaler counts and other trigger sum¬ 
mary information at the end of each spill, and 
writes it into the data stream for monitoring 
purposes. 

(c) DCS Client: this is a process for communi¬ 
cation with the DCS (Detector Control Sys¬ 
tem), responsible for monitoring the slowly 
changing detector parameters, such as tem¬ 
peratures, high voltages etc. This process pe¬ 
riodically retrieves logs from the DCS system 
and appends to the event data stream for mon¬ 
itoring purposes. 

(d) Recorder: this process is responsible for col¬ 
lecting the subevents on a first come first 
served basis from the data channels of the re¬ 
ceiver processes. It collects the subevent data 
addresses in an event data pool for each in¬ 
coming subdetector data channel. If the end- 
of-event trailer arrives in each data channel, 
it builds the event directly onto disk follow¬ 
ing an internal consistency check. Upon re¬ 
quest, it also duplicates in memory the event 
being built for the Event Server process. The 
Recorder process has two further processes 
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Runcontrol GUI 
(Front-end) 


subordinate to it for efficient parallelization 
of the data receiving procedure. 

i. DDL Receiver, this process waits for 
subevent data delivered by the RORC 
cards and forwards their address on 
a first come first served basis to the 
Recorder process. The RORC cards de¬ 
liver the event data to the pre-allocated 
Physmem area without CPU interven¬ 
tion. The addresses of the avail¬ 
able Physmem pages, returned by the 
Recorder, are filled by this process to the 
RORC cards internal FIFO. In the patho¬ 
logical case when the Recorder cannot 
build the events fast enough, the RORC 
cards may run out of free Physmem page 
addresses. In that case, the RORC cards 
stall the data stream via issuing an XOFF 
signal towards the DDL lines. The data 
stream is restarted by XON, when ad¬ 
dresses of new free Physmem pages ar¬ 
rived into the internal FIFO of the given 
RORC card. 

ii. VME Receiver, this process polls for 
subevent data in the VME memory of 
the low level systems, accessed via 
the CAEN V1718 VME-to-USB bridge 
units. Whenever a new subevent is de¬ 
tected in any of the subsystems, it is 
copied to the Physmem area allocated 
for this process, is stored in a ring buffer, 
and the subevent address is passed to 
the Recorder process. In the pathologi¬ 
cal case when the Recorder cannot drain 
the subevents fast enough, the following 
event triggers are blocked by command¬ 
ing our VME controllable TTL register 
units (CES RCB8047 CORBO) to issue 
a busy signal. 

The non data intensive inter-process communication of 
the 8 subprocesses of the core of the Runcontrol soft¬ 
ware is solved using standard Linux shared memory. 
Eor the data intensive inter-process buffering, the Phys¬ 
mem area is used instead, in order to avoid unneces¬ 
sary overhead by the Linux kernel. The data flow of the 
Central DAQ software during data taking is illustrated 
inPig.pj] 

The DAQ software produces its final output data 
stream as a sequence of 1 GByte files, called chunks, 
for easier handling. Whenever the currently being writ¬ 
ten chunk exceeds 1 GByte, the corresponding chunk is 


Runcontrol CLI 
(Back-end) 




>1 

CASTOR 

CBESV 


(The CERN Advanced STORage manager) 


Figure 13: Sketch of the data flow of the Central DAQ software, the 
Runcontrol, during data taking run. The flgure also depicts the process 
hierarchy, consisting of 9 individual subprocesses. The DDL Receiver 
and VME Receiver lay the anived data into Physmem pages, and these 
page addresses are passed to Recorder for event building. After the 
Recorder has built the events onto disk, it the recycles addresses of 
the free Physmem pages to DDL Receiver and VME Receiver. 


closed and the following event is written to a new chunk. 
The chunks are written to the RAIDO buffer disk in or¬ 
der to reach the necessary bandwidth for writing (the 
writing bandwidth to a RAIDO disk is larger than to a 
single disk). 

The DAQ software is supplemented with a set of 
failure-tolerant system of scripts, responsible for final 
treatment of the chunks. 

1. Online QA (Quality Assessment) of the newly ar¬ 
rived raw data chunks via a dedicated online QA 
software for fast integrity checks and for acquiring 
histograms of important basic quantities for mon¬ 
itoring and sending them for online display. That 
piece of software is using the standard NA61 of¬ 
fline software framework, called Shine Hm. Sub¬ 
sequently, the properties of the produced control 
histograms are also automatically verified by a 
simpler program (“Electronic Shifter”). In this 
way, an alarm can issued whenever e.g. a well- 
defined detector failure occurs. The QA procedure 
is performed via two independent processes for ef¬ 
ficient parallelization. 

2. Copying of the certified raw data chunks to the 
CASTOR archiving system IfT^ of CERN. This is 
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performed via two independent processes for effi¬ 
cient parallelization. 

3. Verification of the raw data chunks on tape. The 
CASTOR system provides the adler32 checksum 
and size of the chunks migrated to tape. These 
two verification parameters are cross-checked with 
that of the chunk on the Central DAQ PC’s RAIDO 
buffer disk by a script. The local copy on the Cen¬ 
tral DAQ PC is deleted whenever the verification 
passes. In the rare cases when verification fails, 
the raw data chunk is kept on the Central DAQ PC 
for human intervention. 

8. Raw data structure 

Book-keeping of the DDL page structure along with 
the “LVDS train” substructure of the data stream is per¬ 
formed by the DDL Receiver process. These are un¬ 
folded from the data stream at the event building level 
upon writing to disk, in order to have contiguous bi¬ 
nary data from each MotherBoard, free from all the data 
frames used only for communication protocols between 
MotherBoard and ConcentratorBox, or Concentrator- 
Box and Central DAQ. The subevent data from the beam 
detector system or the ToF system are already built by 
the low level FIC controllers, and are simply received 
by the VME Receiver process and passed over for event 
building without modification. 

The raw data structure written by the Central DAQ is 
organized as follows. Each event is written on disk af¬ 
ter each other as a Record, consisting of a sequence of 
4 Byte integer words, the first word encoding the num¬ 
ber of words in the Record (inclusive), followed by the 
data words of the Record. The data words are organized 
in BOS (Bank Objective System) banks, consisting of 
a 4 word header and a sequence of data words within 
the BOS Bank. The first BOS header word is the iden¬ 
tifier of the type of subdetector data, the second BOS 
header word is a geographical detector identifier, the 
third BOS header word is reserved (set to zero), and the 
fourth BOS header word encodes the number of follow¬ 
ing data words (exclusive) within the BOS Bank. The 
4 Byte words in the event data stream, whether inter¬ 
preted as integers or floating point numbers, are always 
encoded in the “network byte order”, also called “Big- 
Endian format”. 

The most important BOS Banks are the followings: 
EVEH (event header data), CAMC (beam detector 
data), TOED (ToE data), MBRD (TPC MotherBoard 
data), MBLV (TPC MotherBoard housekeeping param¬ 
eters, i.e. low voltages, temperatures), PSDD (the same 


as MBRD for the PSD detector), PSLV (the same as 
MBLV for PSD detector). 

The used raw data structure based on Record and 
BOS Banks proved to be a flexible way for extension 
of the detector system or for deprecation of old subde¬ 
tectors. 

The size of a “black event” (event without pedestal 
subtraction, noise suppression and zero compression) is 
100 MByte if the TPC drift time is sampled in 512 time- 
bins, whereas 50 MByte in case of the typically used 
256 timebins. Eor a normal event (event with pedestal 
subtraction, noise suppression and zero compression), 
the typical size is 1.5-5 MByte, slightly depending on 
track the multiplicity in the event. 

9. Performance 

The presented readout and Data Acquisition system 
was set first in operation in 2009, with the start of the 
data taking program of NA61/SHINE. After initial de¬ 
bugging, it served as a very reliable system throughout 
the extended data taking periods of 2-5 months per year, 
and so far only minor upgrades were needed. The total 
deadtime, 12 ms, of the system proved to be sufficient to 
complete the experimental program in a timely manner. 
Improvement on the deadtime would not provide fur¬ 
ther significant benefits as presently our data taking rate 
is limited by the available beam intensity for most of the 
collision types as well as the limited target thickness in 
order to suppress secondary collisions. Therefore, the 
construction is considered quite mature and stable. The 
acquisition system was used to record a large amount of 
physics data on 29 kinds of reactions, with the number 
of events ranging from 1 to 40 M, totaling up to 194M 
recorded physics events during the years 2009-2015. 
It is expected that a further 20 kinds of reactions will 
be recorded through 2015-2019, typically comprised of 
4 M events each. 

10. Concluding remarks 

The design concepts and the used technologies of the 
readout system of the NA61/SHINE experiment at the 
CERN SPS were summarized. Special emphasis was 
given to the readout of the large, 40 m^ TPC system; the 
most important detector component in the complex. The 
readout technology is largely based on the DDL data 
transmission, also used in ALICE. The present system 
provides possibility for reading out physics events at a 
rate of 83 Hz, which satisfies the present needs given 
the upper safety limit of usable beam intensity. The 
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raw event size is 50 MBytes, which after compression 
reduces to 1.5-5 MByte depending on the number of 
particles in the event (1-1500). The described system 
serves since 2009 and recorded a large data set of 194 M 
physics quality collision events. Data taking is foreseen 
through at least 2019, with possible upgrades to the ToF 
detector. 
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